1. Introduction

Markov Chain Monte Carlo (MCMC) methods are well-known tools for sampling a target
distribution $\pi$ known up to a multiplicative constant. MCMC algorithms sample $\pi$ by constructing
a Markov chain admitting $\pi$ as unique invariant distribution. A canonical example is the
Metropolis-Hastings algorithm [27, 20]: given the current value $X_n$ of the chain $\{X_j, j \geq 0\}$, it
consists in proposing a move $Y_{n+1}$ under a proposal distribution $Q(X_n, \cdot)$. This move is then accepted
with probability

$$\alpha_n = 1 \wedge \frac{\pi(Y_{n+1})\, Q(Y_{n+1}, X_n)}{\pi(X_n)\, Q(X_n, Y_{n+1})} ,$$

where $a \wedge b$ stands for $\min(a, b)$; otherwise, $X_{n+1} = X_n$.
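The accept/reject rule above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' code: the names `mh_step`, `log_target`, `propose`, `log_q` and the Gaussian random-walk example are assumptions introduced here, and densities are handled on the log scale for numerical stability.

```python
import math
import random


def mh_step(x, log_target, propose, log_q, rng):
    """One Metropolis-Hastings transition from the current state x.

    log_target(x): log of the target density, up to an additive constant.
    propose(x, rng): draws a candidate Y ~ Q(x, .).
    log_q(x, y): log density of proposing y from x.
    (All names are illustrative; they do not come from the paper.)
    """
    y = propose(x, rng)
    # log of pi(y) Q(y, x) / [pi(x) Q(x, y)]
    log_ratio = (log_target(y) + log_q(y, x)) - (log_target(x) + log_q(x, y))
    alpha = math.exp(min(0.0, log_ratio))  # alpha = min(1, ratio)
    return y if rng.random() < alpha else x


def run_random_walk_chain(n_iter, step=1.0, seed=0):
    """Random-walk MH chain targeting a standard Gaussian (toy example)."""
    rng = random.Random(seed)
    log_target = lambda x: -0.5 * x * x
    propose = lambda x, r: x + r.gauss(0.0, step)
    log_q = lambda x, y: 0.0  # symmetric proposal: the Q terms cancel
    x, chain = 0.0, []
    for _ in range(n_iter):
        x = mh_step(x, log_target, propose, log_q, rng)
        chain.append(x)
    return chain
```

With a symmetric proposal the two $Q$ terms cancel, which is why `log_q` returns zero in the toy chain.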

It is known that the efficiency of MCMC methods depends upon the choice of the proposal
distribution [31]. For example, when sampling multi-modal distributions, a Metropolis-Hastings
algorithm with $Q(X_n, \cdot)$ equal to a Gaussian distribution centered at $X_n$ tends to get stuck in one
of the modes. The convergence of such an algorithm is then slow, and the target distribution is
not correctly approximated unless a huge number of points is sampled.

Efficient implementations of MCMC rely on strong user expertise in order to choose a
proposal kernel and, more generally, design parameters adapted to the target $\pi$.

This is the reason why adaptive and interacting MCMC methods have been introduced. Adaptive
MCMC methods consist in choosing, at each iteration, a transition kernel $P_\theta$ among a family
$\{P_\theta, \theta \in \Theta\}$ of kernels with invariant distribution $\pi$: the conditional distribution of $X_{n+1}$ given the past is
$P_{\theta_n}(X_n, \cdot)$, where the parameter $\theta_n$ is chosen according to the past values of the chain $\{X_n, n \geq 0\}$.
Since the pioneering Adaptive Metropolis algorithm of [19], many adaptive MCMC algorithms have been
proposed and successfully applied (see the survey papers by [5], [31], [6] for example).

Interacting MCMC methods rely on the (parallel) construction of a family of processes with
distinct stationary distributions; the key behind these techniques is to allow interactions when
sampling these different processes. At least one of these processes has $\pi$ as stationary distribution.
The stationary distributions of the auxiliary processes are chosen in such a way that they have
nice convergence properties, hoping that the process under study will inherit them. For example,
in order to sample multi-modal distributions, a solution is to draw auxiliary processes with target
distributions equal - up to the normalizing constant - to tempered versions $\pi^{1/T_i}$, $T_i > 1$. This
solution is the basis of the parallel tempering algorithm [18], where the states of two parallel chains
are allowed to swap. Following this tempering idea, different interacting MCMC algorithms have
been proposed and studied so far [H [Til [HI [H]. 
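To see why tempered versions are easier to sample, the following sketch compares the height of the "barrier" between two modes for different temperatures. The target used here (an equal-weight mixture of two unit-variance Gaussians centered at $\pm 5$) and the function names are assumptions chosen purely for illustration.

```python
import math


def mixture_density(x, mu=5.0):
    """Equal-weight mixture of N(-mu, 1) and N(mu, 1) (illustrative target)."""
    gauss = lambda m: math.exp(-0.5 * (x - m) ** 2) / math.sqrt(2.0 * math.pi)
    return 0.5 * gauss(-mu) + 0.5 * gauss(mu)


def barrier(temperature, mu=5.0):
    """Ratio of the tempered density pi^(1/T) at a mode to its value at x = 0.

    A large ratio means that a local sampler must cross a region of very
    low probability to move from one mode to the other.
    """
    ratio = mixture_density(mu, mu) / mixture_density(0.0, mu)
    return ratio ** (1.0 / temperature)
```

For this target the ratio is of order $10^5$ at temperature $T = 1$, whereas at $T = 10$ it drops to a single-digit value, so a local sampler targeting the tempered density crosses between the modes far more easily.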

The Equi-Energy sampler of Kou, Zhou and Wong [22] is an example of such interacting MCMC
algorithms. $K$ processes are sampled in parallel, with target distributions (proportional to) $\pi^{\beta_k}$,
$1 = \beta_K > \beta_{K-1} > \cdots > \beta_1$. The first chain $Y^{(1)} = \{Y_n^{(1)}, n \geq 0\}$ is usually a Markov chain; then
$Y^{(k)}$ is built from $Y^{(k-1)}$ as follows: with a fixed probability $\epsilon$, the current state $Y_n^{(k)}$ is allowed to
jump onto a past state of the auxiliary chain $\{Y_m^{(k-1)}, m \leq n\}$, and with probability $(1 - \epsilon)$, $Y_n^{(k)}$
is obtained using a "local" MCMC move (such as a random walk Metropolis step or a
Metropolis-adjusted Langevin step). This mechanism includes the computation of an acceptance ratio so that
the chain $Y^{(k)}$ will have $\pi^{\beta_k}$ as target density. As the acceptance probability of such a jump could
be very low, only jumps toward selected past values of $Y^{(k-1)}$, namely those with an energy close
to that of the current state $Y_n^{(k)}$, are allowed. This selection step allows higher acceptance rates of
the jump, and a faster convergence of the algorithm is expected.

The Equi-Energy sampler has many design parameters: the interacting probability $\epsilon$, the number
$K$ of parallel chains, the temperatures $T_k = 1/\beta_k$, $k \in \{1, \ldots, K\}$, and the selection function. It is
known that all of these design parameters play a role in the efficiency of the algorithm. [22] suggest
some values for all these parameters, designed for practical implementation and based on empirical
results on some simple models. [3] discuss the choice of the interacting probability $\epsilon$ in similar
contexts; [8] discuss the choice of the temperatures of the chains for the Parallel Tempering
algorithm. Recently, an algorithm combining parallel tempering with equi-energy moves has been
proposed by [10].

In this paper, we discuss the choice of the energy rings and the selection function, when the jump
probability $\epsilon$, the number $K$ of auxiliary processes and the temperatures are fixed. We introduce
a new algorithm, called the Adaptive Equi-Energy sampler, in which the selection function is defined
adaptively based on the past history of the sampler. We also address the convergence properties of
this new sampler.

Different kinds of convergence of adaptive MCMC methods have been addressed in the literature: 
convergence of the marginals, the law of large numbers (LLN) and central limit theorems (CLT) 
for additive functionals (see e.g. [29] for convergence of the marginals and weak LLN of general
adaptive MCMC, [1] or [3l] for LLN and CLT for adaptive Metropolis algorithms, [IB] and [IT] for 
convergence of the marginals, LLN and CLT for general adaptive MCMC algorithms - see also the 
survey paper by [6]). 

There are few analyses of the convergence of interacting MCMC samplers. The original proof
of the convergence of the Equi-Energy sampler in [22] (resp. [7]) contains a serious gap, mentioned
in [7] (resp. [2]). [3] established a strong LLN for a simplified version of the Equi-Energy sampler, in
which the number of levels is set to $K = 2$ and the proposals during the interaction step are drawn
uniformly at random in the past of the auxiliary process. Finally, Fort, Moulines and Priouret [16]
established the convergence of the marginals and a strong LLN for the same simplified version of
the Equi-Energy sampler (with no selection) but removed the limitations on the number of
parallel chains.

The paper addresses the convergence of an interacting MCMC sampler in which the proposals
are selected from energy rings which are constructed adaptively at each level. In this paper, we
obtain the convergence of the marginals and a strong LLN for a smooth version of the Equi-Energy
sampler and its adaptive variant. We illustrate our results in several difficult scenarios such as
sampling mixture models with "well-separated" modes and motif sampling in biological sequences.
The paper is organized as follows: in Section 2 we derive our algorithm and set the notations
that are used throughout the paper. The convergence results are presented in Section 3. Finally,
Section 4 is devoted to the application to motif sampling in biological sequences. The proofs of the
results are postponed to the Appendix.



2. Presentation of the algorithm

2.1. Notations. Let $(\mathsf{X}, \mathcal{X})$ be a measurable Polish state space and $P$ be a Markov transition kernel
on $(\mathsf{X}, \mathcal{X})$. $P$ operates on bounded functions $f$ on $\mathsf{X}$ and on finite positive measures $\mu$ on $\mathcal{X}$:

$$Pf(x) = \int P(x, dy) f(y) , \qquad \mu P(A) = \int \mu(dx) P(x, A) .$$

The $n$-iterated transition kernel $P^n$, $n \geq 0$, is defined by:

$$P^n(x, A) = \int P^{n-1}(x, dy) P(y, A) = \int P(x, dy) P^{n-1}(y, A) ;$$

by convention, $P^0(x, A)$ is the identity kernel. For a function $V : \mathsf{X} \to [1, +\infty[$, we denote by $|f|_V$
the $V$-norm of a function $f : \mathsf{X} \to \mathbb{R}$:

$$|f|_V = \sup_{x \in \mathsf{X}} \frac{|f(x)|}{V(x)} .$$

If $V = 1$, this norm is the usual uniform norm. Let $\mathcal{L}_V = \{f : \mathsf{X} \to \mathbb{R}, |f|_V < +\infty\}$. We also define
the $V$-distance between two probability measures $\mu_1$ and $\mu_2$ by:

$$\|\mu_1 - \mu_2\|_V = \sup_{f, |f|_V \leq 1} |\mu_1(f) - \mu_2(f)| .$$

When $V = 1$, the $V$-distance is the total-variation distance and will be denoted by $\|\mu_1 - \mu_2\|_{TV}$.
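On a finite state space these definitions reduce to elementary sums, which the following sketch makes explicit. The function names and the representation of measures and functions as Python lists are assumptions made for illustration. The supremum in the $V$-distance is attained at $f = V \cdot \mathrm{sign}(\mu_1 - \mu_2)$, so it equals $\sum_x |\mu_1(x) - \mu_2(x)| V(x)$; with $V = 1$ this convention gives $\sum_x |\mu_1(x) - \mu_2(x)|$, i.e. twice the "supremum over sets" form of the total-variation distance.

```python
def v_norm(f, V):
    """|f|_V = max_x |f(x)| / V(x) for functions given as lists of values."""
    return max(abs(fx) / vx for fx, vx in zip(f, V))


def v_distance(mu1, mu2, V=None):
    """V-distance between two probability vectors on a finite state space.

    The supremum over {f : |f|_V <= 1} is reached at f = V * sign(mu1 - mu2),
    hence the closed form sum_x |mu1(x) - mu2(x)| * V(x).
    """
    if V is None:
        V = [1.0] * len(mu1)  # V = 1: total-variation distance
    return sum(abs(a - b) * v for a, b, v in zip(mu1, mu2, V))
```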

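Let $(\Theta, \mathcal{T})$ be a measurable space, and $\{P_\theta, \theta \in \Theta\}$ be a family of Markov transition kernels;
$\Theta$ can be finite or infinite dimensional. It is assumed that for all $A \in \mathcal{X}$, $(x, \theta) \mapsto P_\theta(x, A)$ is
$(\mathcal{X} \otimes \mathcal{T} \,|\, \mathcal{B}([0, 1]))$-measurable, where $\mathcal{B}([0, 1])$ denotes the Borel $\sigma$-field on $[0, 1]$.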

2.2. The Equi-Energy sampler. Let $\pi$ be the probability density of the target distribution with
respect to a dominating measure $\mu$ on $(\mathsf{X}, \mathcal{X})$. In many applications, $\pi$ is known up to a multiplicative
constant; therefore, we will denote by $\pi_u$ the (unnormalized) density.

We denote by $P$ the Metropolis-Hastings kernel with proposal density kernel $q$ and invariant
distribution $\pi$ defined by:

$$P(x, A) = \int_A r(x, y) q(x, y) \mu(dy) + 1_A(x) \int (1 - r(x, y)) q(x, y) \mu(dy) ,$$

where $(x, y) \mapsto r(x, y)$ is the acceptance ratio given by

$$r(x, y) = 1 \wedge \frac{\pi(y) q(y, x)}{\pi(x) q(x, y)} .$$

The Equi-Energy (EE) sampler proposed by [22] exploits the fact that it is often easier to sample
a tempered version $\pi^\beta$, $0 < \beta < 1$, of the target distribution than $\pi$ itself. This is why the algorithm
relies on an auxiliary process $\{Y_n, n \geq 0\}$, run independently from $\{X_n\}$ and admitting $\pi^\beta$ as
stationary distribution (up to a normalizing constant). This mechanism can be repeated, yielding
a multi-stage Equi-Energy sampler.

We denote by $K$ the number of processes run in parallel. Let $\epsilon \in (0, 1)$. Choose $K$ temperatures
$T_1 > \cdots > T_K = 1$ and set $\beta_k = 1/T_k$; and $K$ MCMC kernels $\{P^{(k)}, 1 \leq k \leq K\}$ such that
$\pi^{\beta_k} P^{(k)} = \pi^{\beta_k}$. The $K$ processes $Y^{(k)} = \{Y_n^{(k)}, n \geq 0\}$, $1 \leq k \leq K$, are defined by induction on the
probability space $(\Omega, \mathcal{F}, \mathbb{P})$. The first auxiliary process $Y^{(1)}$ is a Markov chain, with $P^{(1)}$ as transition
kernel. Given the auxiliary process $Y^{(k-1)}$ up to time $n$, $\{Y_m^{(k-1)}, m \leq n\}$, and the current state
$Y_n^{(k)}$ of the process of level $k$, the Equi-Energy sampler draws $Y_{n+1}^{(k)}$ as follows:

• (Metropolis-Hastings step) with probability $1 - \epsilon$, $Y_{n+1}^{(k)} \sim P^{(k)}(Y_n^{(k)}, \cdot)$.

• (equi-energy step) with probability $\epsilon$, the algorithm selects a state $Z_{n+1}$ from the auxiliary
process having an energy close to that of the current state. An acceptance-rejection ratio is
then computed and if accepted, $Y_{n+1}^{(k)} = Z_{n+1}$; otherwise, $Y_{n+1}^{(k)} = Y_n^{(k)}$.

In practice, [22] only apply the equi-energy step when there is at least one point in each ring. In
[22], the distance between the energy of two states is defined as follows. Consider an increasing
sequence of positive real numbers

$$\xi_0 = 0 < \xi_1 < \cdots < \xi_S = +\infty . \qquad (1)$$

If the energies of two states $x$ and $y$ belong to the same energy ring, i.e. if there exists $1 \leq \ell \leq S$
such that $\xi_{\ell-1} \leq \pi_u(x), \pi_u(y) < \xi_\ell$, then the two states are said to have "close energy". The choice
of the energy rings is most often a difficult task. As shown in Figure 3 [right], the Equi-Energy
sampler is inefficient when the energy rings are not appropriately defined. The efficiency of the
sampler is increased when the variation of $\pi_u$ in each ring is small enough so that the equi-energy
move is accepted with high probability.
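The two-branch transition described above can be sketched as follows for fixed rings. This is a minimal illustration under stated assumptions, not the implementation of [22]: the function names and argument layout are hypothetical, candidates are drawn uniformly among the auxiliary states lying in the ring of the current state, the acceptance ratio is the ratio of $\pi_u^{\beta_k - \beta_{k-1}}$ values (what the Metropolis-Hastings ratio reduces to for such a uniform within-ring draw), and the fallback to a local move when the current ring is empty is a choice made for this sketch only.

```python
import bisect
import random


def ring_index(energy, inner_bounds):
    """0-based index of the ring [xi_{l-1}, xi_l) that contains `energy`.

    inner_bounds = [xi_1, ..., xi_{S-1}] is sorted; xi_0 = 0 and
    xi_S = +infinity are implicit.
    """
    return bisect.bisect_right(inner_bounds, energy)


def ee_step(y, aux_history, pi_u, beta_k, beta_prev, inner_bounds,
            eps, local_move, rng):
    """One transition of the level-k chain with fixed energy rings."""
    if rng.random() >= eps:
        return local_move(y, rng)  # Metropolis-Hastings step
    ring = ring_index(pi_u(y), inner_bounds)
    candidates = [z for z in aux_history
                  if ring_index(pi_u(z), inner_bounds) == ring]
    if not candidates:
        # Assumption of this sketch: use a local move when the ring of
        # the current state holds no auxiliary point.
        return local_move(y, rng)
    z = rng.choice(candidates)  # uniform draw inside the ring
    ratio = (pi_u(z) / pi_u(y)) ** (beta_k - beta_prev)
    return z if rng.random() < min(1.0, ratio) else y
```

Here `local_move(y, rng)` stands for one draw from the kernel $P^{(k)}(y, \cdot)$, and `aux_history` for the stored states of the auxiliary chain of level $k - 1$.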

2.3. The Adaptive Equi-Energy sampler. We propose to modify the Equi-Energy sampler by
adapting the energy rings "on the fly", based on the history of the algorithm. Our new algorithm,
called the Adaptive Equi-Energy sampler (AEE), is similar to the Equi-Energy sampler of [22] except
for the equi-energy step, which relies on adaptive boundaries of the rings. For the definition of the
process $Y^{(k)}$, $k \geq 2$, adaptive boundaries computed from the process $Y^{(k-1)}$ are used.

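For a distribution $\theta$ in $\Theta$, denote by $\xi_{\theta,\ell}$, $\ell \in \{1, \ldots, S - 1\}$, the bounds of the rings, computed
from r.v. with distribution $\theta$; by convention, $\xi_{\theta,0} = 0 < \xi_{\theta,1} < \cdots < \xi_{\theta,S-1} < \xi_{\theta,S} = +\infty$. Define
the associated energy rings $H_{\theta,\ell} = [\xi_{\theta,\ell-1}, \xi_{\theta,\ell})$ for $\ell \in \{1, \ldots, S\}$. We consider selection functions
$g_\theta(x, y)$ of the form

$$g_\theta(x, y) = \sum_{\ell=1}^{S} h_{\theta,\ell}(x) h_{\theta,\ell}(y) , \qquad h_{\theta,\ell}(x) = \left(1 - d(\pi_u(x), H_{\theta,\ell})\right)_+ , \qquad (2)$$

where $d(\pi_u(x), H_{\theta,\ell})$ measures the distance between $\pi_u(x)$ and the ring $H_{\theta,\ell}$. By convention $h_{\theta,\ell} = 0$
if $H_{\theta,\ell} = \emptyset$. We finally introduce a set of selection kernels $\{K_\theta^{(k)}, \theta \in \Theta\}$ for all $k \in \{2, \ldots, K\}$
defined by

$$K_\theta^{(k)}(x, A) = \int_A \alpha_\theta^{(k)}(x, y) \frac{g_\theta(x, y) \theta(dy)}{\int g_\theta(x, z) \theta(dz)} + 1_A(x) \int \left(1 - \alpha_\theta^{(k)}(x, y)\right) \frac{g_\theta(x, y) \theta(dy)}{\int g_\theta(x, z) \theta(dz)} , \qquad (3)$$

where

$$\alpha_\theta^{(k)}(x, y) = 1 \wedge \left( \frac{\pi^{\beta_k - \beta_{k-1}}(y) \int g_\theta(x, z) \theta(dz)}{\pi^{\beta_k - \beta_{k-1}}(x) \int g_\theta(y, z) \theta(dz)} \right) . \qquad (4)$$

$K_\theta^{(k)}$ is associated to the equi-energy step when defining $Y^{(k)}$: a draw under the selection kernel
proportional to $g_\theta(x, y) \theta(dy)$ is combined with an acceptance-rejection step. The acceptance-rejection
step is defined so that when $\theta \propto \pi^{\beta_{k-1}}$, $\pi^{\beta_k}$ is invariant for $K_\theta^{(k)}$ [22].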

This equi-energy step is only allowed when each ring contains at least one point (of the auxiliary
process $Y^{(k-1)}$ up to time $n$). We therefore introduce, for all positive integer $m$, the set $\Theta_m$:

$$\Theta_m = \left\{ \theta \in \Theta : \frac{1}{m} \leq \inf_x \int g_\theta(x, y) \theta(dy) \right\} . \qquad (5)$$
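For an empirical distribution and a "hard" selection (where $g_\theta(x, y) = 1$ if $x$ and $y$ lie in the same ring and $0$ otherwise), the integral $\int g_\theta(x, y) \theta(dy)$ is simply the fraction of auxiliary points in the ring of $x$. Membership in $\Theta_m$ then amounts to every ring holding at least a fraction $1/m$ of the points, as the sketch below illustrates. The function names are hypothetical, and the sketch assumes that every ring is reachable by some state $x$, so that the infimum over $x$ is a minimum over rings.

```python
import bisect


def min_ring_fraction(aux_energies, inner_bounds):
    """Smallest fraction of auxiliary points falling in any single ring.

    aux_energies: values pi_u(Y_m) of the stored auxiliary states.
    inner_bounds: sorted boundaries [xi_1, ..., xi_{S-1}].
    """
    counts = [0] * (len(inner_bounds) + 1)
    for e in aux_energies:
        counts[bisect.bisect_right(inner_bounds, e)] += 1
    return min(counts) / len(aux_energies)


def in_theta_m(aux_energies, inner_bounds, m):
    """Check the condition 1/m <= inf_x int g(x, y) theta(dy) (hard rings)."""
    return min_ring_fraction(aux_energies, inner_bounds) >= 1.0 / m
```

In particular, an empty ring gives a minimum fraction of zero, so the empirical distribution then belongs to no $\Theta_m$ and the equi-energy step is not used.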



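With these notations, AEE satisfies for any $n \geq 0$ and $k \in \{1, \ldots, K\}$,

$$\mathbb{E}\left[f(Y_{n+1}^{(k)}) \,\middle|\, \mathcal{F}_n^{(k)}\right] = \mathbb{E}\left[f(Y_{n+1}^{(k)}) \,\middle|\, Y_m^{(k)}, Y_m^{(k-1)}, 1 \leq m \leq n\right] = P^{(k)}_{\theta_n^{(k-1)}} f(Y_n^{(k)}) , \qquad (6)$$

where $\{\mathcal{F}_n^{(k)}, n \geq 0\}$ is the filtration generated by the processes $Y^{(l)}$, $1 \leq l \leq k$, up to time $n$; the
transition kernel is given by $P_\theta^{(1)} = P^{(1)}$ and for $k \geq 2$,

$$P_\theta^{(k)} = \left(1 - \epsilon 1_{\theta \in \cup_m \Theta_m}\right) P^{(k)} + \epsilon 1_{\theta \in \cup_m \Theta_m} K_\theta^{(k)} ;$$

and $\theta_n^{(k)}$ is the empirical distribution

$$\theta_n^{(k)} = \frac{1}{n} \sum_{m=1}^{n} \delta_{Y_m^{(k)}} , \qquad k \in \{1, \ldots, K\}, \ n \geq 1 . \qquad (7)$$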

Different functions $d$ can be chosen. For example, the function given by

$$d(\pi_u(x), H_{\theta,\ell}) = 1 - 1_{H_{\theta,\ell}}(\pi_u(x)) = \begin{cases} 0 & \text{if } \pi_u(x) \in H_{\theta,\ell} , \\ 1 & \text{otherwise} , \end{cases} \qquad (8)$$

yields a selection function $g_\theta$ such that $g_\theta(x, y) = 1$ iff $x, y$ are in the same energy ring and
$g_\theta(x, y) = 0$ otherwise. In this case, the acceptance-rejection ratio $\alpha_\theta^{(k)}(x, y)$ is equal to
$1 \wedge \left(\pi^{\beta_k - \beta_{k-1}}(y) / \pi^{\beta_k - \beta_{k-1}}(x)\right)$ upon noting that by definition of the proposal kernel, the points $x$ and
$y$ are in the same energy ring. By using this "hard" distance during the equi-energy jump, all the
states of the auxiliary process having their energy in the same ring as the energy of the current state
are chosen with the same probability, while the other auxiliary states have no chance to be selected.
Other functions $d$ could be chosen, such as "soft" selections of the form

$$d(\pi_u(x), H_{\theta,\ell}) = \frac{1}{r} \min_{y \in H_{\theta,\ell}} |\pi_u(x) - y| , \qquad (9)$$

where $r > 0$ is fixed. With this "soft" distance, given a current state $Y_n^{(k)}$, the probability for each
auxiliary state $Y_i^{(k-1)}$, $i \leq n$, to be chosen is proportional to $g_{\theta_n^{(k-1)}}(Y_n^{(k)}, Y_i^{(k-1)})$. Then, the "soft"
selection function allows auxiliary states having an energy in an $r$-neighborhood of the energy ring of
$\pi_u(Y_n^{(k)})$ to be chosen, as well as states having their energy in this ring. Nevertheless, this selection
function yields an acceptance-rejection ratio $\alpha_\theta^{(k)}$ which may prove quite costly to evaluate.
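The selection function and the acceptance ratio (4) can be evaluated as follows when $\theta$ is the empirical distribution of stored auxiliary states. This is a sketch under assumptions: the function names are hypothetical, states are represented only through their energies $\pi_u(\cdot)$, the list `edges` holds all ring boundaries including $0$ and $+\infty$, and the minimum in the "soft" distance is taken as the gap to the nearest end of the ring.

```python
import math


def dist_hard(e, lo, hi):
    """'Hard' distance: 0 inside the ring [lo, hi), 1 outside."""
    return 0.0 if lo <= e < hi else 1.0


def make_dist_soft(r):
    """'Soft' distance: gap between e and the ring [lo, hi), divided by r."""
    def dist_soft(e, lo, hi):
        if lo <= e < hi:
            return 0.0
        return (lo - e if e < lo else e - hi) / r
    return dist_soft


def g(ex, ey, edges, dist):
    """Selection function: sum over rings of h_l(x) h_l(y)."""
    total = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        hx = max(0.0, 1.0 - dist(ex, lo, hi))  # positive part
        hy = max(0.0, 1.0 - dist(ey, lo, hi))
        total += hx * hy
    return total


def acceptance(ex, ey, aux_energies, edges, dist, dbeta):
    """Acceptance ratio for a jump from energy ex to energy ey.

    dbeta = beta_k - beta_{k-1}. The two integrals against the empirical
    distribution are sums over the stored auxiliary energies (the 1/n
    factors cancel in the ratio).
    """
    gx = sum(g(ex, ez, edges, dist) for ez in aux_energies)
    gy = sum(g(ey, ez, edges, dist) for ez in aux_energies)
    return min(1.0, (ey / ex) ** dbeta * gx / gy)
```

With `dist_hard` the two sums coincide whenever both states lie in the same ring, so only the density ratio remains; with a soft distance each evaluation requires a pass over all stored auxiliary states, which illustrates the cost mentioned above.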

The asymptotic behavior of AEE will be addressed in Section 3. The intuition is that when the
empirical distribution $\theta_n^{(k-1)}$ of the auxiliary process of order $k - 1$ converges (in some sense) to
$\theta_\star^{(k-1)}$, the process $\{Y_n^{(k)}, n \geq 0\}$ will behave (in some sense) as a Markov chain with transition
kernel $P^{(k)}_{\theta_\star^{(k-1)}}$.

2.4. A toy example (I). To highlight the interest of our algorithm, we consider toy examples:
the target density $\pi$ is a mixture of $\mathbb{R}^d$-valued Gaussians. This model is known to be difficult, as
illustrated (for example) in [6] for a random walk Metropolis-Hastings sampler (SRWM), an
EE-sampler and a parallel tempering algorithm. Indeed, if the modes are well separated, a
Metropolis-Hastings algorithm using only "local moves" is likely to remain trapped in one of the modes for
a long period of time. In the following, AEE is implemented with ring boundaries computed as
described in Section 3.3.

Figure 1 displays the target density $\pi$ and the simulated one for three different algorithms
(SRWM, EE and AEE) in one dimension. The histograms are obtained with 10^ samples; for EE
and AEE, the probability of interaction is $\epsilon = 0.1$, the number of parallel chains is equal to $K = 5$
and the number of rings is $S = 5$. For the adaptive definition of the rings in AEE, we choose the
"hard" selection (8) and the construction of the rings defined in Section 3.3. In the same vein,
Figure 2 displays the points obtained by the three algorithms when sampling a mixture of two
Gaussian distributions in two dimensions. As expected, in both figures, SRWM never explores one
of the modes, while EE and AEE are far more efficient.

Footnote: MATLAB codes for AEE are available at the address http://perso.telecom-paristech.fr/~schreck/index.html

Figure 1. Comparison of SRWM (left), EE (center) and AEE (right) for a Gaussian
mixture in one dimension.

Figure 2. Comparison of the algorithms for a Gaussian mixture in two dimensions:
(from left to right) the true density, SRWM, EE and AEE.

To compare EE and AEE in a more challenging situation, we consider the case of a mixture with
two components in ten dimensions. We run EE and AEE with $K = 3$ parallel chains with respective
temperatures $T_1 = 1$, $T_2 = 9$, $T_3 = 60$; the probability of jump $\epsilon$ is equal to 0.1, and the number
of rings is $S = 50$. Both algorithms are initialized in one of the two modes of the distribution.
For the Metropolis-Hastings step, we use a Symmetric Random Walk with Gaussian proposal; the
covariance matrix of the proposal is of the form $cI$ where $c$ is calibrated so that the mean acceptance
rate is approximately 0.25. Figure 3 displays, for each algorithm, the L^-norm of the empirical
mean, averaged over 10 independent trajectories, as a function of the length of the chains.

In order to show that the efficiency of EE depends crucially upon the choice of the rings, we
choose a set of boundaries so that in practice, along one run of the algorithm, some of the rings are
never reached. Figure 3(a) compares EE and AEE in this extreme case: even after 2 x 10^ iterations,
all of the equi-energy jumps are rejected for the (non-adaptive) EE, and the algorithm is trapped
in one of the modes. This does not occur for AEE, and the L^-error tends to zero as the number of
iterations increases. This illustrates that our adaptive algorithm avoids the poor behavior that EE
can have when the choice of its design parameters is inappropriate.

We now run EE in a less extreme situation: we choose (fixed) energy rings so that the sampler can
jump more easily between the modes than in the previous experiment. Figure 3(b) illustrates that
the adaptive choice of the energy rings speeds up the convergence, as the equi-energy jumps are
accepted more often. As a numerical comparison, the equi-energy jumps were accepted about
ten times more often for AEE than for EE.




Figure 3. Error of EE (dashed line) and AEE for two different target densities in ten dimensions. 



2.5. Toy example (II). For a better understanding of how our algorithm behaves, Figure 4(a)
displays the evolution of the ring bounds used in the definition of $Y$. In this numerical application,
the target density is a mixture of two Gaussian distributions in one dimension; EE and AEE are
run with $K = 5$ chains, $S = 5$ rings and $\epsilon = 0.1$, for a number of iterations varying from 0 to 10^.
As expected, the ring bounds become stable after a reasonable number of iterations. Moreover, we
observed that the (non-adaptive) EE run with the rings fixed to the limiting values obtained with
AEE behaves remarkably well.

Finally, to give an idea of the role played by $\epsilon$, Figure 4(b) displays the average error of AEE
for a mixture of two Gaussian distributions in one dimension, after 2 x 10^ iterations and for 100
independent trajectories, when $\epsilon$ varies from 0 to 1. If $\epsilon$ is too small, AEE does not mix well
enough, and if $\epsilon$ is too large, the algorithm jumps easily from one mode to another but does not
explore each mode well enough, which explains the 'u' shape of the curve. This experiment suggests
that there exists an optimal value for $\epsilon$, but to the best of our knowledge, the optimal choice of this design
parameter is an open problem.

3. Convergence of the Adaptive Equi-Energy sampler

In this section, the convergence of the $K$-stage Adaptive Equi-Energy sampler is established. In
order to make the proof easier, we consider the case when the distance function $d$ in the definition
of the selection function (2) is given by (9).

Figure 4. (a): Evolution of the ring bounds; (b): Averaged error of AEE as a
function of $\epsilon$.



[16] provide sufficient conditions for the convergence of the marginals and the strong LLN (s-LLN)
of interacting MCMC samplers. We use their results and show the convergence of the marginals i.e.

$$\lim_{n \to \infty} \mathbb{E}\left[f(Y_n^{(K)})\right] = \pi(f)$$

for any continuous bounded function $f$. Note that this implies that this limit holds for any indicator
function $f = 1_A$ such that $\pi(\partial A) = 0$, where $\partial A$ denotes the boundary of $A$ [12, Theorem 2.1]. We
then establish the s-LLN: for a wide class of continuous (un)bounded functions $f$,

$$\lim_{n \to \infty} \frac{1}{n} \sum_{m=0}^{n-1} f(Y_m^{(K)}) = \pi(f) \qquad \text{a.s.}$$



3.1. Assumptions. Our results are established for target distributions $\pi$ satisfying

E1 (a) $\pi$ is the density of a probability distribution on the measurable Polish space $(\mathsf{X}, \mathcal{X})$ and
$\sup_{\mathsf{X}} \pi < \infty$ and for any $s \in (0, 1]$, $\int \pi^s(x) dx < \infty$.
(b) $\pi$ is continuous and positive on $\mathsf{X}$.

Usually, the user knows $\pi$ up to a normalizing constant: hereafter, $\pi_u$ will denote this available
(unnormalized) density.

As in [16], we first introduce a set of conditions that will imply the geometric ergodicity of the
kernels $P_\theta^{(k)}$, and the existence of an invariant probability measure for $P_\theta^{(k)}$ (see conditions E2).
We finally introduce conditions on the boundaries of the adaptive energy rings (see conditions E3).
Examples of boundaries satisfying E3 and computed from quantile estimators are given in Section 3.3
(see also [35] for stochastic approximation-based adapted boundaries).

Convergence of adaptive and interacting MCMC samplers is addressed in the literature by
assuming containment conditions and diminishing adaptations (so called after [29]). Assumption E2 is
the main tool to establish a (generalized) containment condition. In our algorithm, the adaptation
mechanism is due to (a) the interaction with an auxiliary process and (b) the adaptation of the rings.
Therefore, assumptions E2 and E3 are related to the diminishing adaptation condition (see e.g.
Lemma iB.Gl in Section [B.sp . 

E2 For each $k \in \{1, \ldots, K\}$:

(a) $P^{(k)}$ is an irreducible transition kernel which is Feller on $(\mathsf{X}, \mathcal{X})$ and such that
$\pi^{\beta_k} P^{(k)} = \pi^{\beta_k}$.
(b) There exist $\lambda_k \in (0, 1)$, $b_k < +\infty$ and $\tau_k \in (0, \tau_{k-1} \beta_{k-1} / \beta_k)$ such that
$P^{(k)} W_k \leq \lambda_k W_k + b_k$ with

$$W_k(x) = \left( \frac{\pi(x)}{\sup_{\mathsf{X}} \pi} \right)^{-\tau_k \beta_k} ; \qquad (10)$$

by convention, $\tau_0 \beta_0 = \beta_1$.
(c) For all $p \in (0, \sup_{\mathsf{X}} \pi)$, the sets $\{\pi \geq p\}$ are 1-small for $P^{(k)}$.

Note that by definition of $\tau_k$ and E1(a), $W_{k+1} \in \mathcal{L}_{W_k}$ and $\int W_k(x) \pi^{\beta_k}(x) dx < \infty$.

E2 is satisfied for example if for each $k$, $P^{(k)}$ is a symmetric random walk Metropolis-Hastings
kernel and $\pi$ is a sub-exponential target density [30', f^T].

In our algorithm, $Y^{(1)}$ is a Markov chain with transition kernel $P^{(1)}$. As discussed in [28, chapters
13 and 17], E2 is sufficient to prove ergodicity and a s-LLN for $Y^{(1)}$. E2 also implies uniform
$W_1$-moments for $Y^{(1)}$. These results, which initialize our proof by induction of the convergence for
the process number $K$, are given in Proposition 3.1. Define the probability distributions

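$$\theta_\star^{(k)}(dx) = \frac{\pi^{\beta_k}(x)}{\int \pi^{\beta_k}(z) dz} \, dx . \qquad (11)$$

Proposition 3.1. Assume E1, E2 and $\mathbb{E}\left[W_1(Y_0^{(1)})\right] < \infty$. Then,

(a) For all bounded measurable functions $f$, $\lim_{n \to \infty} \mathbb{E}\left[f(Y_n^{(1)})\right] = \theta_\star^{(1)}(f)$.
(b) $\theta_\star^{(1)}(W_2) < +\infty$, and for any measurable function $f$ in $\mathcal{L}_{W_1}$, $\lim_{n \to \infty} \theta_n^{(1)}(f) = \theta_\star^{(1)}(f)$ a.s.
(c) $\sup_n \mathbb{E}\left[W_1(Y_n^{(1)})\right] < \infty$.

E3 (a) For any $k \in \{1, \ldots, K - 1\}$, $\inf_{\ell \in \{1, \ldots, S-1\}} \int h_{\theta_\star^{(k)},\ell}(y) \, \theta_\star^{(k)}(dy) > 0$.
(b) For any $k \in \{1, \ldots, K - 1\}$ and $\ell \in \{1, \ldots, S - 1\}$,
$\lim_{n \to \infty} \left| \xi_{\theta_n^{(k)},\ell} - \xi_{\theta_\star^{(k)},\ell} \right| = 0$ w.p.1.
(c) There exists $\Gamma > 0$ such that for any $k \in \{1, \ldots, K - 1\}$, any $\ell \in \{1, \ldots, S - 1\}$, and
any $\gamma \in (0, \Gamma)$, $\limsup_n n^\gamma \left| \xi_{\theta_n^{(k)},\ell} - \xi_{\theta_{n-1}^{(k)},\ell} \right| < \infty$ w.p.1.

Note that by definition of $h_{\theta,\ell}$ (see (2))

$$\int h_{\theta,\ell}(y) \, \theta(dy) \geq \theta\left(\{y : \pi_u(y) \in H_{\theta,\ell}\}\right) . \qquad (12)$$

Condition E3(b) states that the rings $\{H_{\theta_n^{(k)},\ell}, n \geq 0\}$ converge to $H_{\theta_\star^{(k)},\ell}$ w.p.1; therefore, E3(a) is
satisfied as soon as the limiting rings are of positive probability under the distribution of $\pi_u(Z)$
when $Z \sim \theta_\star^{(k)}$.

When the energy bounds are fixed, the conditions E3(b)-(c) are clearly satisfied and E3(a) holds under
a convenient choice of the rings. We will discuss in Section 3.3 how to check the condition E3 with
adaptive energy bounds.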

3.2. Convergence results. Proposition 3.2 shows that the kernels $P_\theta^{(k)}$ satisfy a geometric drift
inequality and a minorization condition, with constants in the drift independent of $\theta$ for $\theta \in \Theta_m$
($\Theta_m$ being defined in (5)). The proof is in Appendix A.1.

Proposition 3.2. Assume E1(a) and E2. For all $k \in \{1, \ldots, K\}$:






AMANDINE SCHRECK, GERSENDE FORT AND ERIC MOULINES 



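(a) There exist $\tilde{\lambda}_k \in (0, 1)$ and $\tilde{b}_k < +\infty$ such that for all $m \geq 1$ and any $\theta \in \Theta_m$,

$$P_\theta^{(k)} W_k \leq \tilde{\lambda}_k W_k + \tilde{b}_k \, m \, \theta(W_k) . \qquad (13)$$

For all $p \in (0, \sup_{\mathsf{X}} \pi)$ and all $\theta \in \cup_m \Theta_m$, the sets $\{\pi \geq p\}$ are 1-small for $P_\theta^{(k)}$ and the
minorization constants depend neither upon $\theta$ nor on $m$.

(b) For all $\theta \in \cup_m \Theta_m$, there exists a probability measure $\pi_\theta^{(k)}$ invariant for $P_\theta^{(k)}$. In addition,
$\pi_\theta^{(k)}(W_k) \leq \tilde{b}_k (1 - \tilde{\lambda}_k)^{-1} m \, \theta(W_k)$ for $\theta \in \Theta_m$.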

Theorem 3.3 is proved in Section B. Theorem 3.3(a) shows that there exists $m_\star \geq 1$ such that
w.p.1, for all $n$ large enough $\theta_n^{(k)}$ belongs to $\Theta_{m_\star}$. Note that in [2], a s-LLN for the Equi-Energy
sampler is established by assuming that there exists a deterministic positive integer $m$ such that
w.p.1, $\theta_n^{(k)} \in \Theta_m$ for any $n$. Such a condition is quite strong since, roughly speaking, it means that
after $n$ steps (even for small $n$), all the rings contain a number of points which is proportional to $n$,
w.p.1. This is all the more difficult to guarantee in practice since the rings have to be chosen prior
to any exploration of $\pi$. Our approach allows us to relax this strong condition.

The convergence of the marginals and the law of large numbers both require the convergence
in $n$ ($k$ fixed) of $\{\pi^{(k)}_{\theta_n^{(k-1)}}(f), n \geq 0\}$ for some functions $f$. Such a convergence is addressed in
Theorem 3.3(b). We will then have the main ingredients to establish the convergence results for the
processes $Y^{(k)}$, $k \geq 1$.

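Theorem 3.3. Assume E1, E2, E3 and $\mathbb{E}\left[W_k(Y_0^{(k)})\right] < \infty$ for all $k \in \{1, \ldots, K\}$.

(a) There exists $m_\star \geq 1$ such that for all $k \in \{1, \ldots, K - 1\}$,

$$\mathbb{P}\left( \bigcup_{q \geq 1} \bigcap_{n \geq q} \left\{ \theta_n^{(k)} \in \Theta_{m_\star} \right\} \right) = 1 .$$

(b) For any $k \in \{1, \ldots, K\}$, any $a \in (0, 1)$ and any continuous function $f \in \mathcal{L}_{W_k^a}$,

$$\lim_{n \to \infty} \pi^{(k)}_{\theta_n^{(k-1)}}(f) = \theta_\star^{(k)}(f) , \qquad \text{w.p.1.}$$

(c) For any $k \in \{1, \ldots, K\}$ and for all bounded continuous functions $f : \mathsf{X} \to \mathbb{R}$,
$\lim_{n \to \infty} \mathbb{E}\left[f(Y_n^{(k)})\right] = \theta_\star^{(k)}(f)$.

(d) Let $a \in (0, [\ldots] \wedge 1)$. For any $k \in \{1, \ldots, K\}$ and for all continuous functions $f$ in $\mathcal{L}_{W_k^a}$,

$$\lim_{n \to \infty} \frac{1}{n} \sum_{m=1}^{n} f(Y_m^{(k)}) = \theta_\star^{(k)}(f) \qquad \mathbb{P}\text{-a.s.}$$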

Observe that, for the process $\{Y_n^{(k)}, n \in \mathbb{N}\}$, the family of functions for which the law of large
numbers holds depends (i) upon $\Gamma$ given by E3(c), i.e. in some sense, upon the adaptation
rate; and (ii) upon the temperature ladder. In the case where $\tau_k$ can be chosen arbitrarily close to $\beta_1 / \beta_k$ for
any $k$ (see comments after |2H Theorem 4.1 and 4.3]), this family of functions only depends upon
$\Gamma$ and the lowest inverse temperature $\beta_1$: it is all the more restrictive as $\beta_1$ is small.

To the best of our knowledge, we are the first to prove such convergence results for AEE (and EE):
previous works [El [3] consider the simpler case when there is no selection, i.e. $g_\theta(x, y) = 1$.

3.3. Comments on Assumption E3. We propose to choose the adaptive boundaries $\xi_{\theta,\ell}$ as the
$p_\ell$-quantile of the distribution of $\pi_u(Z)$ when $Z$ is sampled under the distribution $\theta$. This section
proves that empirical quantiles of regularly spaced orders are examples of adaptive boundaries $\xi_{\theta_n^{(k)},\ell}$
satisfying E3. Let $F_\theta$ be the cumulative distribution function (cdf) of the r.v. $\pi_u(Z)$ when $Z \sim \theta$:

$$F_\theta(x) = \int 1_{\{\pi_u(z) \leq x\}} \theta(dz) , \qquad x \in [0, \infty) .$$

We denote the quantile function associated to $\pi_u(Z)$ by:

$$F_\theta^{-1}(p) = \inf\{x \geq 0, F_\theta(x) \geq p\} \quad \forall p > 0 ; \qquad F_\theta^{-1}(0) = 0 .$$

With this definition, for $0 < p_1 < \cdots < p_{S-1} < 1$, we set $\xi_{\theta,\ell} = F_\theta^{-1}(p_\ell)$.
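For an empirical distribution, the generalized inverse above is an order statistic, and the ring boundaries can be computed as in the following sketch. The function names and the choice $p_\ell = \ell / S$ for the regularly spaced orders are assumptions of this illustration, not a description of the authors' MATLAB code.

```python
def empirical_quantile(values, p):
    """F^{-1}(p) = inf{x >= 0 : F(x) >= p} for the empirical cdf of `values`.

    For p > 0 this is the k-th smallest value, where k is the smallest
    integer such that k / n >= p; by convention F^{-1}(0) = 0.
    """
    if p <= 0.0:
        return 0.0
    ordered = sorted(values)
    n = len(ordered)
    for k in range(1, n + 1):
        if k / n >= p:
            return ordered[k - 1]
    return ordered[-1]


def ring_boundaries(aux_energies, n_rings):
    """Inner boundaries xi_1 <= ... <= xi_{S-1} at orders p_l = l / S."""
    return [empirical_quantile(aux_energies, level / n_rings)
            for level in range(1, n_rings)]
```

Here `aux_energies` stands for the values $\pi_u(Y_m^{(k-1)})$, $m \leq n$, so the boundaries are recomputed from the history of the auxiliary chain and each ring receives roughly the same share of auxiliary points.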

With this choice of the boundaries, the condition E3(a) holds: by (12), E3(a) is satisfied because $\pi$
is continuous. The conditions E3(b)-(c) require the convergence of the quantile estimators and a rate
of convergence of the variation of two successive boundaries. To prove such conditions, we use a
Hoeffding-type inequality.

Proposition 3.4. Assume
(i) The cumulative distribution function $F_{\theta_\star^{(1)}}$, where $\theta_\star^{(1)}$ is given by (11), is differentiable with
positive derivative on $F_{\theta_\star^{(1)}}^{-1}((0, 1))$.
(ii) There exists $W$ such that $Y^{(1)}$ is a $W$-uniformly ergodic Markov chain with initial distribution
satisfying $\mathbb{E}\left[W(Y_0^{(1)})\right] < \infty$.

Then E3(b)-(c) hold with $\Gamma = 1/2$ and $K = 2$.

The proof is in Appendix B. Extending Proposition 3.4 to the case when $Y^{(1)}$ is not a uniformly
ergodic Markov chain is, to the best of our knowledge, an open question. Therefore, our convergence result
for AEE when the boundaries are the quantiles defined by inversion of the cdf of the auxiliary process
applies to the 2-stage level and seems difficult to extend to the $K$-stage, $K > 2$.

We proved recently in [35] that when the quantiles are defined by a stochastic approximation
procedure, the conditions E3(b)-(c) hold even under very weak conditions on the auxiliary process $Y^{(k)}$, $k \geq 2$.
In this case, the convergence of the $K$-level AEE with $K > 2$ is established.

4. Application to motif sampling in biological sequences 

One of the challenges in biology is to understand how gene expression is regulated. Biologists have
found that proteins called transcription factors play a role in this regulation. Indeed, transcription
factors bind on special motifs of DNA and then attract or repel the enzymes that are responsible
for the transcription of DNA sequences into proteins. This is the reason why finding these binding motifs
is crucial. But binding motifs do not contain deterministic start and stop codons: they are only
random sequences that occur more frequently than expected under the background model.

Several methods have been proposed so far to retrieve binding motifs [36, 24, c|j], which yield a
complete Bayesian model [25]. Among the Bayesian approaches, one effective method is based on the
Gibbs sampler [23]; it has been popularized by software programs [26], [33]. Nevertheless, as discussed
in [22], it may happen that classical MCMC algorithms are inefficient for this Bayesian approach.
Therefore, [22] show the interest of the Equi-Energy sampler when applied to this Bayesian inverse
problem; more recently, [32] proposed a Gibbs-based algorithm for a similar model (their model
differs from the following one through the assumptions on the background sequence).

We start with a description of our model for motif sampling in biological sequences - this section
is close to the description in [22] but is provided to make this paper self-contained. We then
apply AEE and compare it to the Interacting MCMC of [16, Section 3] (hereafter called I-MCMC),
and to a Metropolis-Hastings algorithm (MH). Comparison with Gibbs-based algorithms (namely
BioProspector and AlignACE) can be found in the paper of [22].

The available data is a DNA sequence, which is modeled by a background sequence in which
some motifs are inserted. The background sequence is represented by a vector $S = (s_1, s_2, \ldots, s_L)$ of
length $L$. Each element $s_i$ is a nucleotide in $\{A, C, G, T\}$; in this paper, we will choose the convention
$s_i \in \{1, 2, 3, 4\}$. The length $w$ of a motif is assumed to be known. The motif positions are collected
in a vector $A = (a_1, \ldots, a_L)$, with the convention that $a_i = j$ iff the nucleotide $s_i$ is located at
position number $j$ of a motif; and $a_i = 0$ iff $s_i$ is not in the motif. The goal of the statistical analysis
of the data $S$ is to explore the distribution of $A$ given the sequence $S$. We now introduce notations
and assumptions on the model in order to define this conditional distribution.

We denote by $p_0$ the probability that a sub-sequence of length $w$ of $S$ is a motif. It is 
assumed that the background sequence is a Markov chain with (deterministic) transition matrix 
$\nu_0 = \{\nu_0(i,j)\}_{1 \leq i,j \leq 4}$ on $\{1, \dots, 4\}$; and the nucleotides of a motif are sampled from a 
multinomial distribution of parameter $\nu = \{\nu(i,j)\}_{1 \leq i \leq 4, 1 \leq j \leq w}$, $\nu(i,j)$ being the probability for the $j$-th 
element of a motif to be equal to $i$. 

In practice, it has been observed that approximating $\nu_0(i,j)$ by the frequency of jumps from $i$ to 
$j$ in the (whole) sequence $S$ is satisfactory. It is assumed that the r.v. $(\nu, p_0)$ are independent with 
prior distributions $\prod_{j=1}^{w} \chi(\nu(\cdot,j))$ and $\chi'(p_0)$; $\chi(\nu(\cdot,j))$ is a Dirichlet distribution with parameters 
$\iota_j = (\iota_{j,1}, \dots, \iota_{j,4})$ and $\chi'(p_0)$ is a Beta distribution with parameters $(b_1, b_2)$. $\iota_j$, $b_1$ and $b_2$ are 
assumed to be known. 
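A minimal sketch of this plug-in estimate of the background transition matrix, assuming nucleotides coded as integers in {1, ..., 4} and rows normalized to sum to one. The function name and the uniform fallback for a state that is never left are illustrative choices, not taken from the paper.

```python
from collections import Counter


def estimate_background_transitions(seq, n_states=4):
    """Estimate nu_0(i, j) by the empirical frequency of jumps i -> j in seq.

    seq is a list of nucleotides coded in {1, ..., n_states}.
    Returns a nested list nu0 with nu0[i-1][j-1] ~ P(s_{k+1} = j | s_k = i).
    """
    # Count every consecutive pair (s_k, s_{k+1}) in the whole sequence.
    jumps = Counter(zip(seq[:-1], seq[1:]))
    nu0 = []
    for i in range(1, n_states + 1):
        row_total = sum(jumps[(i, j)] for j in range(1, n_states + 1))
        if row_total == 0:
            # Assumption: use a uniform row when state i is never left.
            nu0.append([1.0 / n_states] * n_states)
        else:
            nu0.append([jumps[(i, j)] / row_total
                        for j in range(1, n_states + 1)])
    return nu0


# Usage on a short, hypothetical coded sequence.
example_nu0 = estimate_background_transitions([1, 2, 1, 2, 1, 3])
```

Each row of the returned matrix is a probability vector, so it can be used directly as the transition law of the background chain.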

Therefore, given $(\nu, p_0)$, $(A, S)$ is a Markov chain described as follows: 

• If $a_{k-1} \in \{1, \dots, w-1\}$ then $a_k = a_{k-1} + 1$; else $\mathbb{P}(a_k = 1 \mid a_{k-1} \in \{0, w\}, p_0, \nu) = 1 - \mathbb{P}(a_k = 0 \mid a_{k-1} \in \{0, w\}, p_0, \nu) = p_0$. 

• If $a_k = 0$, $s_k \sim \nu_0(s_{k-1}, \cdot)$; else $s_k$ is drawn from a Multinomial distribution with parameter $\nu(\cdot, a_k)$. 

The chains are initialized with $\mathbb{P}(a_1 = 1 \mid p_0) = 1 - \mathbb{P}(a_1 = 0 \mid p_0) = p_0$; the distribution of $s_1$ given 
$a_1 = 0$ and $\nu$ (resp. given $a_1 = 1$ and $\nu$) is uniform on $\{1, \dots, 4\}$ (resp. a Multinomial distribution 
with parameter $\nu(\cdot, 1)$). 
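The generative description above can be sketched as a short simulator. This is an illustration under stated assumptions, not the authors' code: the function name, the nested-list layout `nu[i-1][j-1]` for the probability that the j-th motif element equals i, and the truncation of a motif that starts too close to the end of the sequence are all choices made here.

```python
import random


def simulate_motif_sequence(L, w, p0, nu0, nu, rng=None):
    """Simulate (A, S) from the background/motif Markov model.

    nu0[i-1][j-1]: background transition probability from nucleotide i to j.
    nu[i-1][j-1]:  probability that the j-th element of a motif equals i.
    Returns the motif-position vector A and the sequence S (values in 1..4).
    """
    rng = rng or random.Random()
    states = [1, 2, 3, 4]
    A, S = [], []
    for k in range(L):
        if k > 0 and 1 <= A[-1] <= w - 1:
            # Inside a motif: move deterministically to the next position.
            a = A[-1] + 1
        else:
            # At the start, after background, or after a completed motif:
            # start a new motif with probability p0.
            a = 1 if rng.random() < p0 else 0
        if a == 0:
            if k == 0:
                s = rng.choice(states)  # uniform initial nucleotide
            else:
                s = rng.choices(states, weights=nu0[S[-1] - 1])[0]
        else:
            motif_column = [nu[i][a - 1] for i in range(4)]
            s = rng.choices(states, weights=motif_column)[0]
        A.append(a)
        S.append(s)
    return A, S


# Usage with small hypothetical parameters (w = 3).
example_nu0 = [[0.25] * 4 for _ in range(4)]
example_nu = [[0.7, 0.1, 0.1], [0.1, 0.7, 0.1], [0.1, 0.1, 0.7], [0.1, 0.1, 0.1]]
example_A, example_S = simulate_motif_sequence(
    50, 3, 0.1, example_nu0, example_nu, random.Random(0))
```

Passing a seeded `random.Random` instance keeps a simulated data set reproducible across runs.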

This description leads to the following conditional distribution of $A$ given $S$ (up to a multiplicative 
constant); see [22] for a similar derivation: 



P{A\S) oc 



T{Ni{A) + bi)T{No{A) + b2) yr nj=ir(c,,i(^) + 6,-,) 

FiNM) + mA) + 61 + 62) l{ r(Eti + 



L 




L 



X 



afc_ie{l,... 



k=2 



k=2 



where 



• $N_1(A) = \#\{k, a_k = 1\}$ is the number of elements of $A$ equal to 1. 

• $N_0(A) = \#\{k, a_k = 0\}$ is the number of elements of $A$ equal to 0. 

• $c_{i,j}(A) = \sum_{k=1}^{L} 1_{a_k = i} 1_{s_k = j}$ is the number of pairs $(a_k, s_k)$ equal to $(i, j)$. 
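The three counting statistics listed above are cheap to compute in one pass over the data. A small sketch, assuming $A$ and $S$ are stored as equal-length Python lists (the function name is hypothetical):

```python
from collections import Counter


def motif_counts(A, S):
    """Return (N1, N0, pair_counts) for a motif-position vector A and sequence S.

    N1: number of elements of A equal to 1 (number of motif starts).
    N0: number of elements of A equal to 0 (background positions).
    pair_counts[(i, j)]: number of indices k with (a_k, s_k) == (i, j).
    """
    n1 = sum(1 for a in A if a == 1)
    n0 = sum(1 for a in A if a == 0)
    pair_counts = Counter(zip(A, S))
    return n1, n0, pair_counts


# Usage on a toy configuration with two motifs of length w = 2.
example_n1, example_n0, example_pairs = motif_counts(
    [0, 1, 2, 0, 1, 2], [3, 1, 4, 3, 1, 2])
```

A `Counter` returns 0 for pairs that never occur, which is convenient when the counts are plugged into Gamma-function terms with prior pseudo-counts.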
Figure 5. Results given by AEE, I-MCMC and a MH sampler 



To highlight the major role of the equi-energy jumps, and the importance of the construction 
of the rings to make the acceptance probability of the jumps large enough, we compare AEE to 
I-MCMC, and to MH. The data are obtained with values of $p_0$, $\nu_0$ and $\nu$ similar to those of [22]: 
$p_0 = 0.005$, $b_1 = 2$, $b_2 = 200$, $\iota_{j,i} = 1$ for all $j, i$, and 



$$ \nu_0 = \begin{pmatrix} 0.1 & 0.7 & 0.1 & 0.1 \\ 0.1 & 0.1 & 0.7 & 0.1 \\ 0.1 & 0.1 & 0.1 & 0.7 \\ 0.7 & 0.1 & 0.1 & 0.1 \end{pmatrix} $$



( 0.5 0.6 0.2 0.4 0.1 0.3 0.6 0.1 0.4 0.4 0.3 \ 

0.2 0.2 0.8 0.7 0.9 0.2 0.3 

0.8 0.3 0.5 0.4 0.1 

\ 0.5 0.2 0.4 0.1 0.4 0.3 0.1 0.1 0.6 / 



We sample a sequence S of length L = 2000 and the size of the motif is w = 12. 

We now detail how the MH and the Metropolis-Hastings steps of AEE and I-MCMC are run. For 
the Metropolis-Hastings stage, the proposal distribution p{An, An+i) is of the form 



L-l 



where we set 



n+l 



xn+1 



=n+l^ 



, a 



. The proposed state $A_{n+1}$ of the Metropolis-Hastings step 
is then sampled element by element; the distributions are designed to be close to the previous 
model: $a_{j+1}^{n+1}$ is equal to $a_j^{n+1} + 1$ if $a_j^{n+1} \in \{1, \dots, w-1\}$, and else, is sampled under a Bernoulli 
distribution of parameter 



_i ; (15) 

PoU7=i^AAsj+i-i,i) + (1 -Po)nr=i vo{sj+i,Sj+i+i) 

the replacement constant po is fixed by the users and va„ is given by ?)a„(s,0 ^ Cs^i{An) + c- where 
c is a value fixed by the users. (?o(«i'''^) is the Bernoulli distribution with parameter (fT5]l . Finally, 
the candidate $A_{n+1}$ is accepted with probability 

P(i„+i|5)i/^'=p(i„+i,yL„) 

Figure 5 displays the results obtained by AEE, I-MCMC and an MH sampler. Each subplot 
displays two horizontal lines with length equal to the length of the observed DNA sequence. The 
upper line represents the actual localization of the motifs, and the lower line represents in gray-scale 
the probability for each position to be part of a motif, computed from one run of each algorithm after 
2000 iterations. For AEE and I-MCMC, we choose $\epsilon = 0.1$, $K = 5$, $S = 3$. The acceptance rate 
of the jump for AEE was about five times higher than for I-MCMC, which confirms the interest 
of the rings. As expected, AEE performs better than the other algorithms: there were 13 actual 
motifs, and AEE retrieved 10 motifs, whereas I-MCMC and MH retrieved 7 and 6 motifs, 
respectively. 
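The gray-scale quantity plotted on the lower line is a per-position probability estimated from the sampler output. One natural estimator, sketched here as an assumption since the text does not spell out the estimator, is the fraction of sampled vectors $A$ in which the position has a nonzero entry. The function name is hypothetical and no burn-in is discarded.

```python
def motif_membership_probability(samples):
    """Estimate, for each position k, the probability that a_k != 0.

    samples is a list of motif-position vectors A (equal-length lists),
    for example the successive states of one run of a sampler.
    """
    n_samples = len(samples)
    length = len(samples[0])
    return [
        sum(1 for A in samples if A[k] != 0) / n_samples
        for k in range(length)
    ]


# Usage on four toy samples of length 4 (motif length w = 2).
example_probs = motif_membership_probability(
    [[0, 1, 2, 0], [0, 0, 1, 2], [1, 2, 0, 0], [0, 1, 2, 0]])
```

The resulting list can be mapped directly to gray levels, one value per position of the sequence.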

5. Conclusion 

As illustrated by the numerical examples, the efficiency of EE depends upon the choice of the 
energy rings. The adaptation we proposed improves this efficiency since it makes the probability 
of accepting a jump more stable. It is known that adaptation can destroy the convergence of the 
samplers: we proved that AEE converges under quite general conditions on the adapted bounds 
and these general conditions can be used to prove the convergence of AEE when applied with other 
adaptation strategies [35]. It is also the first convergence result for an interacting MCMC algorithm 
including a selection mechanism. Our sketch of proof can be a basis for the proof of other interacting 
MCMC such as the SIMCMC algorithm of [iSj, the Non-Linear MCMC algorithms described in P 
Section 3] or the PTEEM algorithm of [10]. 



Appendix A. Results on the transition kernels $P_\theta^{(k)}$ 

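Define 

$$ G_\theta(x) = \int g_\theta(x,z)\, \theta(dz) , \qquad \tilde\theta(x, dy) = \frac{g_\theta(x,y)\, \theta(dy)}{G_\theta(x)} . \qquad (16) $$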

A.l. Proof of Proposition 13. 2L The case A; = 1 is a consequence of Iil2] since P^^^ = P^^'^ for any 

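$\theta$ so that $\pi_\theta^{(1)} \propto \pi^{\beta_1}$. We now consider the case $k \in \{2, \dots, K\}$: in the proof below, for ease of 

notation we will write $P$, $P_\theta$, $W$, $\lambda$, $b$ and $\pi_\theta$ instead of $P^{(k)}$, $P_\theta^{(k)}$, $W_k$, $\lambda_k$, $b_k$ and $\pi_\theta^{(k)}$. 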

(a) Let m > 1 and 9 G Qm- By definition of gg (see ([2])) and of @m (see 1/m < 

J gg{x,y)9{dy) < S. Moreover, by EQEI 

PqW{x) = (1 - e)PW{x) + eKeW{x) < (1 - e){\W{x) + b)+ eKgW{x) . 

We have bv p. (fTUD and (fTUD 



) = W{x) + I W{y)ae{x,y) (l - ^^^) 9{x,dy) . 



By 



KgW{x 
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Defining ip by tl^{cr) = a/{a + gives the upper bound sup2g[o,i] z{l — z'^) < iP{(t). Hence, 

KeW{x) < W{x) + Smip (rk^k/Wk - Pk-i)) 0{W). This yields PeW{x) < XW{x) + bme{W) with 
A = (1 — e)A + e < 1 and b = eS ip {Tk(3k/{(3k — (3k-i)) + (1 — £)b- The minorization condition comes 
from the lower bound Pg(x,A) > (1 — e)P{x,A). 

(b) Let m > 1 and 9 £ Qm- By E l2fet P is (/9-irreducibIe and so is Pg; Pg possesses a 1-small set and 

is thus aperiodic. In addition, PgW < (1 + \)W/2 + be{W)l{w<c}, with c =^ 26m e{W){l - X)'^ 
and {W < c} is a l-smaU set for Pq. By [281 Chapter 15], irg exists and 7rg{W) < bm9{W)(l - X)-^. 
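The passage from the drift inequality of part (a) to the localized one used here is a short computation. A sketch, writing $\lambda \in (0,1)$ for the contraction coefficient and $B$ for the constant term $b\, m\, \theta(W)$ of part (a) (the shorthand $B$ is introduced here for illustration only):

```latex
\begin{align*}
\lambda W + B
  &= \frac{1+\lambda}{2}\, W + \Bigl( B - \frac{1-\lambda}{2}\, W \Bigr), \\
\text{and on } \{ W > c \} \text{ with } c = \frac{2B}{1-\lambda}: \qquad
  B - \frac{1-\lambda}{2}\, W
  &< B - \frac{1-\lambda}{2}\, c = 0 .
\end{align*}
```

The bracket is therefore negative on $\{W > c\}$ and at most $B$ on $\{W \leq c\}$, which gives $P_\theta W \leq (1+\lambda) W / 2 + B\, 1_{\{W \leq c\}}$.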

A. 2. Ergodic behavior. 

Lemma A.l. Assume i lJbl and iJB Then for all a G (0, 1), for all m > 1 and all 6 £ Qm, there 
exist Cq and pg E (0, 1) such that for all x G'X. and any j > 1 and any k G {1, • • • , K}, 

II (Pf )' (x, .) - nH'^Wwi: < Co 4 W^{x) . (17) 

Let k G {1, • • • ,K — 1} and assume in addition that lim„-i.cxD 9n\Wk) = 0i^\Wk) w.p.l. Then for 
any positive integer q, on the set r\n>q{^n G ©m*} 

limsup/9 (fc) < l,limsupC (ft) < +oo,P— a.s. . (18) 

n n 

Proof. The proof in the case A; = 1 is a consequence of El2] and j28|. Chapter 15] since P^^"* = P*^^) . 

(k) 

Consider the case k>2. Here again, the dependence upon k is omitted: Pg,W,6n denote Pg ,Wk 
and ei^\ 

Proof of |i7p Let a G (0, 1) and set V = W"". By the Jensen's inequality and Proposition 13.21 
there exists AG (0, 1) and b such that for any m > 1 and any 6 G Gm, 

PeV < XV + bme{W)'' . 

Let m > 1 and 9 G &m- By [16' Lemma 2.3.], ()17p holds and there exist constants C, 7 > such 
that for any 9 G 0m 1 

Cg\/ {1- pey^ <C{b m9{W) V 6^^ V (1 - A)"^)^ , 

where 5g is the minorizing constant of Pg on the set {x : W{x) < 2bm9{W) (1 — A)"^ — 1}. 
Proof of m For ah u G f]n>g{^n e e^J, 

limsup{C0,^(^^) V (1 - pe„(i^))^n <c(bm limsup6'„(VF) V limswp5~\. V (1 - A) 

n \ n n " 

Since limsup„ 0„(VF) = 9^{W) < oo w.p.l, limsup„ (^g^^j.^-^ < oo w.p.l. thus showing that on the set 
r\n>q{^n £ ©mj, lim sup„{Ce„(^) V (1 - Pe„(a;))"H < OO. This implies ([18]). □ 

A. 3. Moment conditions. Let m^, > 0. Define for any positive integer q and any k G {1, • • • ,K — 
1}, 

ylW = Pi Pi jej^) G e„,} if g < n, and A'-^I = n otherwise; 

£<k q<j<n 

by convention, A^q)i = Q. for any n > 0. 
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Lemma A. 2. Assume i lJfal f^nd E WkiY^''^) < oo for any k G {I,-"" ^K}. Then for any 



supE 
i>i 



< OO 



(19) 



Proof. The proof is by induction on k. The case /c = 1 is a consequence of El2] since 

P^^^ =PW. As- 
sume the property holds for A; G {2, • • • , K-1}. In this proof, Wk+i, P^''^^\ei^\Y^''\Y^''+^\ P^''+^\ 



(fc+i) 



will be denoted by W, Pe,9n, Y^ X, P, Kq. 



By (l6|) and Proposition 13.21 we obtain, for j > q 



E 



< E 

< AE 

< AE 



Pe^_,W{X,.^)l., 



) 

+ 6m^E 



e,_i(T^)l,(fe-i) 



4{fc) 



+ h nii, sup E 



W{Yi)l^ 



Since VFfc+i £ , the induction assumption implies that sup; E 
this inequality allows to write that for some constant C 

<C'E [WiXg)] 



< OO. Iterating 



supE 



W{Xj)l 



Finally, by definition of Pq. , either Pg^ = P ii 6j ^ ©„, or Pq. = (1 — e)P + sKq. otherwise; note 
that if 6j £ \Jrn®rn then 6j £ ©i/j- Since both P and Pq for 9 £ Um ®m satisfy a drift inequality 
(see EE] and Proposition E^]), E [VF(X<y)] < oo by ([6]). □ 



Appendix B. Proof of Theorem 13.31 
Rl (k) There exists > such that P (^Uq>i r\n>q{^n^ ^ ©^J 
R2 (k) for any a G (0, 1) and any continuous function / G ^W^, 

i'\f) ■ 
'f{Yi'^: 



1. 



hm vr(fLi)(/) 



R3 {k) For all bounded continuous function /, lim^ 



,E 



5(1) 



(/)• 



R4 {k) 6*1 {Wk+i) < +00, and for any a G (0, A 1) and any continuous function / in Cw°-, 



By Proposition 13. H the conditions E[3] and Ed] hold for k = 1; E[2] also holds for k = 1 since 
TTg^^ = ^i^'' for any 9. We assume that for any j < k, for k £ {!,■ ■ ■ ,K — 1}, the conditions H^j — 1), 
HSI^i) and Eil^j) hold. We prove that E^fc), ^k + 1), I^fc + 1) and EtU^A; + 1) hold. To 
make the notations easier, the superscript k is dropped from the notations: the auxiliary process 
win be denoted by Y, and the process F^'^+i' by X; pC^+i), TV^+i, 0^^"+^^ 4''+^^ 

(k) (k) 

and On ,9\ are resp. denoted by P, Kg,Pg,ag,TTg and 9n,0-^. 
Finally, we define the $V$-variation of the two kernels $P_\theta$ and $P_{\theta'}$ by: 

$$ D_V(\theta, \theta') = \sup_{x} \frac{\| P_\theta(x, \cdot) - P_{\theta'}(x, \cdot) \|_V}{V(x)} . $$

When $V = 1$, we will simply write $D$. 

B.1. Proof of R1($k$). The proof is prefaced with a preliminary lemma. 
Lemma B.l. For all I £ {1, • • • , 5" — 1} and any 6, 9' , 



1 



sup \ he,i{x) - he'^i{x) \ < - sup - 



Proof. Note that $|(1 - a)_+ - (1 - b)_+| \leq |b - a|$. Therefore, for all $x \in \mathsf{X}$: 

\d{7ru{x),He,i) - d{'Ku{x),H0f^i)\ 



\hg^l{x) - hg'^i{x)\ < 

I 

This concludes the proof. □ 
(Proof of I{l^k)) We prove there exist an integer > 1 and a positive r.v. N such that 

P(Ar < cx)) = 1 , P I fl jinf / geAx,y)Onidy) > 

\n>N ^ ^ 

To that goal, we prove that with probability 1, for all n large enough, 

inf / ge„{x,y)en{dy) > inf / he^^v) (^Mv) , (20) 

and use the assumption E l3fel For all x and 6, there exists a ring index l^^e G {1) " " " , S} such that 
TTuix) e Hg^i^ g. Upon noting that diiTuix), Hg^i^^^) = 0, it holds 

liminfinf / ge„{x,y)9 n{d.y) > lim inf inf / i[y)9n{d.y) . 

n X J n ig{i,...,s}y 



hea{y)9n{dy)> j he^,i{y)Gn{dy) - j - ^n(dy) 



We write 



> / ^e*,/(y)6'n(dy) - sup - . 

By definition of /le^/, y i— )• hg^^i{y) is continuous and bounded. Therefore, by E[l|^k), Lemma IB. II 
and E 13E1 the proof of (i20]l is concluded by 



lim^inf J he^,i{y)9n{dy) > j he,,i{y)9^{dy) . 

B.2. Proof of r{[2](A; + 1). First of all, observe that by definition of tiq (see Proposition 13. 2p and 
the expression of P^i, ttq^ oc vr'^'^+i. We check the conditions of [16, Theorem 2.11]. By Proposition [a] 
it is sufficient to prove that for any g > 1, lim„^oo |7i"e„(/) - 7r0^(/)|ln > {^jee^j = w.p.l. 
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Case / bounded. Lemma [A. II and ESU^/c) show that on the set r\j>q{(^j ^ ©m*}) limsup^Cg^ < oo 
and Hmsup„(l— < oo w.p.l. Equicontinuity of the class {Pef, S ©m*}, where / is a bounded 
continuous function on X, wih follow from Lemmas IB. 21 to IB.4I Finally, the weak convergence of 
the transition kernels is proved in Lemma lB.51 

Case / unbounded. Following the same lines as in the proof of jlGJ Theorem 3.5], it can be proved 
that the above discussion for / bounded and Proposition I3.2lfb]l imply 

- 7re,(/)}ln.>^{0^.ee™.} = 

w.p.l. for any continuous function / such that |/|vi^^^^ < oo. 

Lemma B.2. For all 9 G [j„^Qm, and x,x', supy \gg{x,y) - gg{x',y)\ < f |vr(x) - tt{x')\. 
Proof. By 

S S 

\geix,y) - ge{x',y)\ < ^ \he,i{x) - h0^i{x')\he,i{y) < ^ \he,i{x) - he,i{x')\ . 
1=1 1=1 

The proof is completed since 

. ^ , , \d{7^{x),He,i)-d{^{x'),Hg^i)\ ^ \7t{x)-Ax')\ 
\he,i{x) - he^i{x )\ < < . 

□ 

Lemma B.3. Assume ffl^ For all m> 1, there exists a constant Cm such that for all x, x', y,y' £ X. 
and 9 £ @m 

\a0{x,y) - ag{x',y)\ <Cm \ Tr'^''~^''+'ix) - TT'^''~^'=+''ix') + |7r(x) - 7r(x')|] , (21) 

\ae{x,y)-ae{x,y')\ < C„ [|7r^'=~'5^-+i(y) - 7r^'=-'^'=+i(y')| + |vr(y) - • (22) 

Proof. By definition of (see (jH), aQ{x,y) — ag{x',y) = (1 A a) — (1 A b), with 

/ ge{x,z)9{dz) ^^^^ 7r^w-/3>c(^) f ge{x',z)9{dz) 

TiPk+\-lik{^x) J g0{y,z)9{dz) vr/^fc+i-Z^fc (x') J g0{y,z)9{dz) 

Note that |(1 A a) — (1 A b)\ < \a — b\ (la<i + lfe<i,a>i)- By symmetry, we can assume that 6 < 1 
and this implies 

^0k+i-^k^y) ^ f gQ{y,z)9{dz) 



- f gg{x',z)e{dz) 



< Sm 



since gg{x,y) < S. Therefore, 



\a-b\ 



TT 



(y) 



f g0{y,z)9{dz) 



Jgg{x,z)9{dz) Jgg{x',z)9{dz) 



IT' 



A+i-/3fc(2;') 



< Sm 



Pk+l—Pk(^y^ Tlf^k—f^k+l (^j.'j _ ^l^k—Pk+l(^r^''^ 



m 



{ge{x,z) - ge{x',z))9{dz) 



The proof of (|2ip is concluded by Lemma [B.2l The proof of (j22p is on the same lines and omitted. □ 

Lemma B.4. Assume and . flljal For any m > 1 and for any continuous bounded function f, 
the class of functions {Pef,9 G ©m} is equicontinuous. 
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Proof. Let / be a continuous function on X, bounded by 1. Let m > 1 and 9 G 0m . We have 
Pefix) - Pefix') =(1 - e) [Pf{x) - Pf{x')) + e (/(x) - f{x')) {l- j ae{x\ y)e{x, dy) 
+ ^ j {f{y)-f{x)){ae{x,y)-a0{x',y)) e{x,dy) 
+ e [ ae{x\y){f{y)- f{x')){e{xAy)-hx'Ay)) , 



where 6 is given by (|16p . This yields to 

\Pef{x) - Pef{x')\ < \Pfix) - Pf{x')\ + - /(xO| 



+ 2sup \ag{x,y) - ae{x',y)\ + 2 e{x, .) - 0{x', .] 
y 



TV 



We have 



~ ~ 1 S 

\\e{x,.) - 6'(x',.)IItv < -prT^^^p\9e{x,y) -g0{x',y)\ + \Ge{x) - Gg{x')\ 

Gg{x) y Ge{x)Gg{x') 

< msup\geix,y) -ge{x',y) \ + Sm^ sup \ge{x,y) -ge{x',y)\ , 
y y 

where Gg is given by (jl6p . So Lemmas IB. 21 and IB.3I imply that for all m > 1 , there exists a constant 
Cm such that for all 9 £ Qm- 

\Pefix) - Pef{x')\ < \Pfix) - Pfix')\ + - 

+ Cm(K(2;)-^(x')| + K^''-"^'=+H2;)-vr^'"^'+H3;')l) • (23) 

The proof is concluded since $P$ is Feller and $\pi$ is continuous. □ 

Lemma B.5. Let m > 1. Assume iJ21 o^nd I^^k). For all a; G X, there exists a set Clx such 
that = 1 for all uj £ il.x o,nd any hounded continuous function f 

i™o - PeJi^)\ la{e,ee,„} = . 

Proof. Following the same lines as in the proof of |16l Proposition 3.3.], it is sufficient to prove that 
for any x G X and any bounded continuous function /, lim„^oo -P6»„(/) = PeAf) w.p.l on the set 
[\j{9j G ©m}- Let / and x be fixed. We write 

Pef{x) - Pe'f{x) =e J {ag{x, y) - ae/(x, y)) {f{y) - f{x)) e{x, dy) 

+ e j ae^ {x, y) {f{y) - f{x)) (e{x, dy) - e'{x, dy)) , (24) 

where 6 is given by (116p . Moreover, 

2/ A \ au A \ g9{x,y)9{dy) - gg>{x,y)e'{dy) (Gg'jx) - Ggjx)) 

9{x, dy) - 9'{x, dy) = — — + gg> x, y)e [dy) ^ . . . • 

Ge[x) Ge{x)Gg'[x) 
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This yields to 

e-'{PeJ{x)-PeJ{x)) = J {ae„{x,y) - aeAx,y)) {f{y)-f{x))Ux,dy) 



-Fix,y) (e^dy) - Onidy)) 



- / ( [g,^{x,y)-geAx,y))^^^,+geAx,y)0.{dy) ^^"^"^^^ 



/ 



where F{x,y) = ag^{x,y) {f{y) — f{x)). There exists a constant Cm such that on the set Hni^" ^ 
0m}, (see the proof of Lemma [6.31 for similar upper bounds) 



\'^er^ix,y) - agAx,y)\ < C„ 



GeAy) GeAy) 

< m^SCrn (IGeJx) - GeA^)\ + \GeAy) " GeAy)\) 
where Ge{x) is defined by ([T6]). We write by definition of the function gg (see ([2])) 



sup I Ge^ (x) - Ge^ (x) | < sup \ge„ {x,z) - ge, {x,z)\+ sup 

X 

S 



x,z 
S 



geAx,z)en{dz) - / ge^x, z)e^{dz) 



<2^sup\he^Az)-he,Az)\ + Y, [ he^A^Wnidz) - f he,,iiz)OAdz) . 
1=1 ^ 1=1 

By Lemma iB.ll and B^b) the first term converges to zero w.p.l. Since t i— )■ hg^^i{t) is continuous, 
RlD^k) implies that the second term tends to zero w.p.l. Therefore, on the set Cl^iOn £ ©m}, 
sup^ y\a0^{x,y) — ag^{x,y)\ converges to zero w.p.l, as well as sup^ ,^ [^^^(x, y) — gQ^{x,y)\, and 
sup^ \GeJx) - GgAx)\ . 

Note that by Lemma IB. 31 y i— t- F{x, y) is bounded and continuous. Therefore, following the 
same lines as above, it can be proved that under ESI^k) and UjH on the set flni^" ^ ©m}, 
lim„^oo I / F{x, y)9n{x, dy) - f F{x, y)6Ax, dy)\ =0 w.p.l □ 

B.3. Proof of R[3](k+1). We check the conditions of Theorem 2.1]. Let / be a bounded 
continuous function on X. By I^k+l), lim„_j.oo vre^ (/) = 7re^,{f) oc 7r'^'=+i w.p.l. Let 6 > 0. By 
Proposition m there exists q > I such that IP(nn>g{^" ^ ©m*}) > I — 6. Following the same lines 
as in the proof of jl6t Theorem 3.4], it can be proved by using Lemmas lA.ll IB.ll and IB. 61 and the 
condition ElHblthat lim„^ooE (/(X„) — TTe„{f)) lf| ^ {e^Ge™^} = 0- This concludes the proof. 

Lemma B.6. For all m > 1, there exists a constant Cm such that for any 9, 6' G Qm, 

D{e,e')<Cmne-e'\\^^j + snp\heAx)-he^A^)\ \ . 



Proof. By definition of Pe, for all function / bounded by 1, (p4l) holds. So 
D{9,9') < 2esup\a0{x,y) - a0>{x,y)\ 



x,y 



+ 2eSm'^ ( sup \ge{x,y) - ge'{x,y)\ + \\0 - 9'\\ty + sup |G0/(x) - G0{x)\ ) 

\x,y X J 
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The term \ag{x, y) — ag'{x, y)\ is equal to |1 A a — 1 A 6| with 

^^ 7r^'=+'-f''={y)fge{x,z)e{dz) ^ _ TT^'^+^-P^jy) f ge'jx, z)e'{dz) 

7r/^k+i-l3k(^x) J gg{y,z)6{dz) TT'^''+i-Pk(^x) J gg>{y, z)6'{dz) ' 

Note that |1 A a — 1 A 6| < jf* — a| (l{;,<i,a>i} + la<i) • Therefore, for all 9, 6' € ©m, 
sup|Qe(2;,y) - aei{x,y)\ < S'^rn^ ( sup \ge{x,y) - ge'{x,y)\ + \\e - 9'\\ty 

x,y \ x,y 

The term \Ge'{x) — Gg{x)\ is upper bounded by 



3' I 



\Ge'{x) - Gg{x)\ < sup\ge{x,y) - ge'{x,y)\ + S\\b - a htv 



Moreover, 

\ge{x,y) - ge'{x,y)\ = ^[he^i{x)he^i{y) - hei,i{x)he'^i{y)] 

1=1 

This concludes the proof. □ 



< 2S sup \he,i{x) - he'^i{x) 

l,x 



B.4. Proof of V^k + 1). Let a G (0, ^ A 1) and set V = W. We check the conditions of [Tg 
Theorem 2.7]. By Proposition O condition A3 of [IGj holds. By Y^k + l), lim„^oo vre„(/) = ttoM) 
w.p.l for any continuous function / in Cw^- Condition A4 (resp. A5) of [16] is proved in Lemma IB. 71 
(resp. Lemma iB.Sp . 

Lemma B.7. Assume ^ ^ ^ ^W^)' "^'^ IE[M^j(yo^^'^)] < oo for all j < k. Then for any 

aG (0, Al) 

Y^j-\Le^ V Le._,fDw4<^j,ej.i)W^{Xj) < oo ¥ - a.s., 
where Lg = Gq V {1 — pe)^^ ■ 

Proof. By R[l{j) for all j < k, it is sufficient to prove that for any positive integer q 

y^j-\Le^y Le^_,fDv{ej,ej^i)V{Xj) 1 « < oo P - a.s. 
i>i 

where A^'^j is defined in Appendix IA.3I Following the same lines as in the proof of Lemma IA.21 it 
can be proved that Ej=ii~H^ej ^ Le^_^fDv{9j,6j_i)V{Xj) < oo w.p.l. 

By Lemma [A. II and HHk), on the set {^i>q{Oi G ©m*}, limsup^Lg^ < oo w.p.l. Therefore, we 
have to prove that '^,^^j~^Dv{6j,9j-.i)V{Xj)l (k) < oo w.p.l. Following the same lines as in the 

proof of Lemma IB. 61 we obtain that on the set ^4^^^- , there exists a constant Gm such that 



Dv{9j,ej^i) < G,n (^sup \^e„i - ^e,^,,i\ + \\0j - ^j-iIItv) {\\Oj\\v + \\Oj-i\\v) 



+ Cm \\d 



m Irj — '^j-l\\V 
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Set s, 7, such that s = 1 V (2a) <l + 7<l + r. By E l3bt there exists a r.v. Z finite w.p.l such that 



:"-a.s. 



Therefore, it holds 



(fc) 



E 



9,l|y + ll^,-i||y)^V^^(^,)l 



We have, 



< 2(7(7) supE 



^j\\v)' l,(fe) 



1/2 



supE 

j 



V{X,)sl 



(fc) 



1/2 



where C{-f) = J2j>q (i^"^"^)/' + i"^/') is finite since 2/s > 1 and 1 + 7 > s. Since F^/^ < 



Lemma I A . 2 1 imphes that sup^- E 
inequahty. 



W(X,)1 



(fc) 



< 00. In addition, since 2/s > 1 we have, by Jensen's 



E 



ll^illfV 

1,3 



< E 



(k-i) 



< E 



i^y§(y,)i^ 



(fc-i) 



p=i 



< supE 



VF(y,)i 



which is finite under Lemma [A.2i Similarly, we prove that J2j>qj ^W^j ~ 9j-i\\vV{Xj)^j^(k) < 
w.p.l, upon noting that \\0j - 9j-i\\v < r'^{V{Yj) + %_i(y)). 



00 



□ 



Lemma B.8. Assume iJTl 
aG (0,1), 



I^k), lU^j) and E[Wj{Y^^^)] < 00 for all j < k. For any 



a.s. 



Proof. By B[T]^j) for all j < k, it is sufficient to prove that for any positive integer q 

■ -l/aj-^/a 



Y^.-xlaj2ia p^y^^x,) < 00 P - a.s. 



where J^^\ is defined in Appendix I A. 31 Let g > 1. By Lemma [A. 11 sup^Lgl (k) < 00 w.p.l; and. 



as in the proof of Lemma IA.21 it can be proved that sup^ E 
concluded since A;^^/" < 00. 



< 00. The proof is 

□ 



ADAPTIVE EQUI-ENERGY SAMPLER: CONVERGENCE AND ILLUSTRATION 






B.5. Proof of Proposition 3.4. The proof uses a Hoeffding inequality for (non-stationary) Markov 
chains. The following result is proved in [15, Section 5.2, Theorem 17]. 

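Proposition B.9. Let $(Y_k)_{k \in \mathbb{N}}$ be a Markov chain on $(\mathsf{X}, \mathcal{X})$, with transition kernel $Q$ and initial 
distribution $\eta$. Assume $Q$ is $W$-uniformly ergodic, and denote by $\theta_\star$ its unique invariant distribution. 
Then there exists a constant $K$ such that for any $t > 0$ and for any bounded function $f : \mathsf{X} \to \mathbb{R}$, 

$$ \mathbb{P}\left( \sum_{i=1}^{n} f(Y_i) - n\, \theta_\star(f) \geq t \right) \leq K\, \eta(W) \exp\left[ - \frac{1}{K} \left( \frac{t^2}{n |f|_\infty^2} \wedge \frac{t}{|f|_\infty} \right) \right] . $$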

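Lemma B.10. Assume that there exists $W$ such that $\{Y_n, n \geq 0\}$ is a $W$-uniformly ergodic Markov 
chain with initial distribution $\eta$ with $\eta(W) < \infty$. Let $l \in \{1, \dots, S-1\}$ and $p_l \in (0,1)$; and set 
$\xi_l = F_{\theta_\star}^{-1}(p_l)$. For all $\epsilon > 0$ and any $n \geq 1$, 

$$ \mathbb{P}\left( |\xi_{\theta_n, l} - \xi_l| > \epsilon \right) \leq 2 K \eta(W) \exp\left( - \frac{n}{K} \left( \delta_\epsilon^2 \wedge \delta_\epsilon \right) \right) , $$

where $\delta_\epsilon = \min\{ F_{\theta_\star}(\xi_l + \epsilon) - p_l ,\; p_l - F_{\theta_\star}(\xi_l - \epsilon) \}$. 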

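Proof. Let $\epsilon > 0$. We write $\mathbb{P}(|\xi_{\theta_n,l} - \xi_l| > \epsilon) \leq \mathbb{P}(\xi_{\theta_n,l} > \xi_l + \epsilon) + \mathbb{P}(\xi_{\theta_n,l} < \xi_l - \epsilon)$. Since $F_{\theta_n}(x) < t$ 
iff $x < F_{\theta_n}^{-1}(t)$, 

$$ \mathbb{P}(\xi_{\theta_n,l} > \xi_l + \epsilon) = \mathbb{P}\left( F_{\theta_n}^{-1}(p_l) > \xi_l + \epsilon \right) = \mathbb{P}\left( p_l > F_{\theta_n}(\xi_l + \epsilon) \right) = \mathbb{P}\left( \sum_{k=1}^{n} 1_{\{\pi_u(Y_k) > \xi_l + \epsilon\}} > n (1 - p_l) \right) . $$

Proposition B.9 is then applied with $f(x) = 1_{\{\pi_u(x) > \xi_l + \epsilon\}}$. As 

$$ \theta_\star(f) = \int 1_{\{\pi_u(x) > \xi_l + \epsilon\}}\, \theta_\star(dx) = 1 - F_{\theta_\star}(\xi_l + \epsilon) , $$

we obtain 

$$ \mathbb{P}(\xi_{\theta_n,l} > \xi_l + \epsilon) = \mathbb{P}\left( \sum_{k=1}^{n} f(Y_k) - n\, \theta_\star(f) > n \left( F_{\theta_\star}(\xi_l + \epsilon) - p_l \right) \right) \leq K \eta(W) \exp\left( - \frac{n}{K} \left[ \left( F_{\theta_\star}(\xi_l + \epsilon) - p_l \right)^2 \wedge \left( F_{\theta_\star}(\xi_l + \epsilon) - p_l \right) \right] \right) $$

for some constant $K$ independent of $n, l, \epsilon$. Similarly, 

$$ \mathbb{P}(\xi_{\theta_n,l} < \xi_l - \epsilon) \leq K \eta(W) \exp\left( - \frac{n}{K} \left[ \left( p_l - F_{\theta_\star}(\xi_l - \epsilon) \right)^2 \wedge \left( p_l - F_{\theta_\star}(\xi_l - \epsilon) \right) \right] \right) , $$

which concludes the proof. □ 

Proof of Proposition 3.4. Let $f_{\theta_\star} = F_{\theta_\star}'$ and $\epsilon_n$ be defined by 

$$ \epsilon_n = \frac{2 \sqrt{2K}}{f_{\theta_\star}(\xi_l)} \sqrt{\frac{\log(n)}{n}} , $$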



where K is given by Lemma [B.lOi Note that under (jl]), fg^{S,i) > since pi G (0, 1). By Fg^ is 
differentiable and we write when n — )• oo 

FgACi + en) - M = + ^n) - i^e.(6) = feA^iK + o(en) • 






AMANDINE SCHRECK, GERSENDE FORT AND ERIC MOULINES 



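Hence $F_{\theta_\star}(\xi_l + \epsilon_n) - p_l \geq \sqrt{2K} \sqrt{\log(n)/n}$ for $n$ large enough. Similarly, $p_l - F_{\theta_\star}(\xi_l - \epsilon_n) \geq \sqrt{2K} \sqrt{\log(n)/n}$ 
for $n$ large enough. So when $n$ is large enough, $n K^{-1} (\delta_{\epsilon_n}^2 \wedge \delta_{\epsilon_n}) \geq 2 \log(n)$ with $\delta_\epsilon$ defined in 
Lemma B.10. By Lemma B.10, for $n$ large enough, 

$$ \mathbb{P}\left( |\xi_{\theta_n,l} - \xi_l| > \epsilon_n \right) \leq \frac{2 K \eta(W)}{n^2} . $$

As $\sum_n \mathbb{P}(|\xi_{\theta_n,l} - \xi_l| > \epsilon_n) < \infty$, the Borel-Cantelli lemma yields $\limsup_n \epsilon_n^{-1} |\xi_{\theta_n,l} - \xi_l| < \infty$ 
w.p.1. This concludes the proof. 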
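The rate obtained in this proof can be observed numerically on a toy case. The sketch below is an illustration only: it replaces the $W$-uniformly ergodic chain by i.i.d. uniform draws on (0, 1), so that the true quantile of order $p$ is $p$ itself, and compares the error of the empirical quantile with $\sqrt{\log(n)/n}$. Function names are hypothetical.

```python
import math
import random


def empirical_quantile(values, p):
    """Generalized inverse of the empirical cdf: smallest x with F_n(x) >= p."""
    ordered = sorted(values)
    index = max(math.ceil(p * len(ordered)) - 1, 0)
    return ordered[index]


def quantile_error_ratio(n, p, rng):
    """Return |empirical quantile - p| / sqrt(log(n) / n) for n uniform draws."""
    sample = [rng.random() for _ in range(n)]
    rate = math.sqrt(math.log(n) / n)
    return abs(empirical_quantile(sample, p) - p) / rate


# Usage: the ratio should stay bounded as n grows.
example_rng = random.Random(1)
example_ratios = [quantile_error_ratio(n, 0.5, example_rng)
                  for n in (1000, 10000, 100000)]
```

In the i.i.d. case the error is typically of order $n^{-1/2}$, so the ratio even tends to decrease slowly; the logarithmic factor is what makes the bound hold almost surely for all large $n$.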